Week 04: Bottom-Up Parsing I & II

Predictive Parser: LL(1)

Predictive Parsing

  • Concept
    • Look at the next? tokens.
    • No backtracking.
    • Accept LL(K) grammars. (Left-to-Right, Leftmost derivation, k tokens)
    • At each step, only one choice.
    • To avoid ambiguousness, need to left-factor the grammar.
  • Parsing Table
    • Leftmode Non-terminal x Next Input Token
    • Use stack to record frontier of parse tree
  • Differnce from Recursive Descent
    • For the leftmost non-terminal S
    • Look at the next input token a
    • Choose the production shown at [S,a]
  • Reject on reaching error state
  • Accept on (end of input && empty stack)

First Set

Def. (First Set)

$$First(X)=\{t\ |\ X\rightarrow^*t\alpha\}\cup\{\epsilon\ |\ X\rightarrow^*\epsilon\}$$

Algo

  1. $First(X)=\{t\}, if\ t\ is\ a\ terminal.$
  2. $\epsilon \in First(X), if \begin{cases} X\rightarrow \epsilon \\ X\rightarrow A_1A_2\dots A_n, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n \end{cases}$
  3. $First(\alpha) \subseteq First(X), if\ X\rightarrow A_1A_2\dots A_n\alpha, and\ \epsilon \in First(A_i), \forall 1\leq i\leq n$

Follow Set

Def. (Follow Set)

$$Follow(X)=\{t\ |\ S\rightarrow^*\beta Xt\delta\}$$

Algo

  1. $\$ \in Follow(S)$
  2. $First(\beta)-\{\epsilon\}\subseteq Follow(X), for\ each\ A\rightarrow\alpha X\beta$
  3. $Follow(A) \subseteq Follow(X), for\ each A\rightarrow\alpha X\beta\ where\ \epsilon\in First(\beta)$

How to Construct LL(1) Parsing Tables

  • Construct a parsing table T for CFG G
  • For each production $A \rightarrow \alpha$ in G do:
    • For each terminal $t \in First(\alpha)$, $T[A, t] = \alpha$
    • If $\epsilon \in First(\alpha),$, for each $t \in Follow(A)$, $T[A,t] = \alpha$
    • If $\epsilon \in First(\alpha)$ and $ \$ \in Follow(A)$, $T[A,\$] = \alpha$
  • If any entry is multiply defined, then G is not LL(1).

Bottom-Up Parsing

Shift-Reduce Parsing

  • Concept
    • Don't need left-factored grammars.
    • Reduce the string to the start symbol. (Inverting production)
    • A bottom-up parser traces a rightmost derivation in reverse. $$ \begin{align} & int*int + int \\ & \textbf{int*T} + int \\ & \textbf{T} + int \\ & T + \textbf{T} \\ & T + \textbf{E} \\ & \textbf{E} \end{align} $$
    • Thm.
      • In some step, let string as $\alpha \beta \gamma$.
      • Assume the next reduction is by $X \rightarrow \beta$. ($\alpha \beta \gamma \rightarrow \alpha X \gamma$)
      • Then $\gamma$ is a string which contains only terminals.
    • Split the string as $L_{Str} | R_{Str}$.
    • $L_{Str}$ contains non-terminals and terminals. $R_{Str}$ contains only terminals.
  • Actions
    • Shift: Move | one place to the right. Shift a terminal to the left string.
    • Reduce: Apply an inverse production at the right of $L_{str}$. $$ for\ A \rightarrow xy, Cbxy|ijk \Rightarrow CbA|ijk $$
  • Implementation
    • Use a stack to maintain $L_{Str}$.
  • If is's legal to shift or reduce, there is a shift-reduce conflict.
  • If is's legal to reduce by two productions, there is a reduce-reduce conflict.

Term. (Handle)

  • Scenario
    • Grammar
      • $E \rightarrow T+E|E$
      • $T \rightarrow int*T|int|(E)$
    • At the steop $int|*int+int$
    • If we reduce by ($T \rightarrow int$) given ($T | *int+int$), it will fail.
  • Target
    • Want to reduce only if the result can still be reduced to the start symbol.
  • A handle is the rhs of the production that you can trace back in the parse tree.
    • $\Rightarrow$ Handles are the children of some internal node of some AST.
  • Assume a rightmost derivation $$S \rightarrow^* \alpha X \omega \rightarrow \alpha\beta\omega$$
    • Then $\beta$ is a handle of $\alpha \beta \omega$.
  • Thm.
    • In shift-reduce parsing, handles appear only at the top of the stack, never inside.
  • Bottom-up parsing algorithms are based on recognizing handles

Recognizing Handles

  • There are no known efficient algo to recognize handles. But there are good heuristics.
  • Viable Prefix: $\alpha_1\dots\alpha_n$ is a viable prefix if there is an $\omega$ s.t. $\alpha_1\dots\alpha_n\ |\ \omega$ is a state of a shift-reduce parser.
    • A viable prefix is always the prefix of some handle.
    • If we can maintain the stack's contents are viable prefixes all the time, no parsing error will occur.
  • Thm. For any grammar, the set of viable prefixes is a regular language.
  • Item: An item is a production with '.' somewhere on the rhs.
    • Example. The items for the production rule $T \rightarrow (E)$ are
      • $T \rightarrow .(E)$
      • $T \rightarrow (.E)$
      • $T \rightarrow (E.)$
      • $T \rightarrow (E).$
    • Example. The items for ($X \rightarrow \epsilon$) are $X \rightarrow .$
    • Items are often called LR(0) items.
  • Target: To describe the states of the production rule.
    • If we need to use some production rule $R$ to reduce, the top of the stack will be the left string split by '.' on some item of $R$.
    • Example.
      • Input: $(int)$
      • Grammar
        • $E \rightarrow T+E\ |\ T$
        • $T \rightarrow int*T\ |\ int\ |\ (E)$
      • $(E$ is the left string split by '.' on the item $T \rightarrow (E.)$

Recognizing Viable Prefixes

  1. Add a dummy production $S' \rightarrow S$ to $G$
  2. The NFA states are the items of $G$
  3. For item $E \rightarrow \alpha .X\beta$ add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^X (E \rightarrow \alpha X.\beta)$
  4. For item $E \rightarrow \alpha .X\beta$ and production $X \rightarrow \gamma$, add transition: $(E \rightarrow \alpha . X \beta) \rightarrow^\epsilon (X \rightarrow .\gamma)$
  5. Every state is an accepting state
  6. Start state is $S' \rightarrow .S$

Valid Items

  • Concept
    • Item $X \rightarrow \beta .\gamma$ is valid for a viable prefix $\alpha\beta$ if $$S' \rightarrow^* \alpha X\omega \rightarrow \alpha\beta\gamma\omega$$ by a right-most derivation.
    • An item $I$ is valid for a viable prefix $\alpha(\alpha_1\dots\alpha_n)$ if the DFA accepts $\alpha_1\dots\alpha_n$ and terminates on some state $s$ containing $I$.
    • The items in $s$ describe what the top of the item stack might be after reading input $\alpha_1\dots\alpha_n$.

LR(0) Parsing

  • Assume
    • Stack contains $\alpha_1\dots\alpha_k$
    • Next input is $t$
    • DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
  • Action
    • Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$
    • Shift if $s$ contains item $X \rightarrow \beta .t\omega$
  • Conflict
    • Reduce/Reduce if any state contains two reduce rule.
    • Shift/Reduce if any state has a reduced item and a shift item

Simple LR Paring (SLR(1) Parsing)

  • Assume
    • Stack contains $\alpha_1\dots\alpha_k$
    • Next input is $t$
    • DFA on input $\alpha_1\dots\alpha_k$ teminates in state $s$
  • Action
    • Reduce by $X \rightarrow \beta$ if $s$ contains item $X \rightarrow \beta.$ and $t \in Follow(X)$
    • Shift if $s$ contains item $X \rightarrow \beta .t\omega$
  • If there are conflicts under these rules, the grammar is not SLR.
  • We can use precedence declarations to resolve conflicts. * is grater than +.

LR(1)

  • More powerful than SLR(1)
  • Build by (item x lookahead).
  • SLR(1) only checks whether lookahead is in follow set.

In [ ]: